Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?

نویسندگان

  • Kevin Duh
  • Akinori Fujino
  • Masaaki Nagata
چکیده

Recent advances in Machine Translation (MT) have brought forth a new paradigm for building NLP applications in low-resource scenarios. To build a sentiment classifier for a language with no labeled resources, one can translate labeled data from another language, then train a classifier on the translated text. This can be viewed as a domain adaptation problem, where labeled translations and test data have some mismatch. Various prior work have achieved positive results using this approach. In this opinion piece, we take a step back and make some general statements about crosslingual adaptation problems. First, we claim that domain mismatch is not caused by MT errors, and accuracy degradation will occur even in the case of perfect MT. Second, we argue that the cross-lingual adaptation problem is qualitatively different from other (monolingual) adaptation problems in NLP; thus new adaptation algorithms ought to be considered. This paper will describe a series of carefullydesigned experiments that led us to these conclusions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Co-Training for Cross-Lingual Sentiment Classification

The lack of Chinese sentiment corpora limits the research progress on Chinese sentiment classification. However, there are many freely available English sentiment corpora on the Web. This paper focuses on the problem of cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data. Machine tr...

متن کامل

Combination of Multi-view Multi-source Language Classifiers for Cross-Lingual Sentiment Classification

Cross-lingual sentiment classification aims to conduct sentiment classification in a target language using labeled sentiment data in a source language. Most existing research works rely on machine translation to directly project information from one language to another. But cross-lingual classifiers always cannot learn all characteristics of target language data by using only translated data fr...

متن کامل

Bilingual Co-Training for Sentiment Classification of Chinese Product Reviews

The lack of reliable Chinese sentiment resources limits research progress on Chinese sentiment classification. However, there are many freely available English sentiment resources on the Web. This article focuses on the problem of cross-lingual sentiment classification, which leverages only available English resources for Chinese sentiment classification. We first investigate several basic meth...

متن کامل

Cross-lingual sentiment classification: Similarity discovery plus training data adjustment

The performance of cross-lingual sentiment classification is sharply limited by the language gap, which means that each language has its own ways to express sentiments. Many methods have been designed to transmit sentiment information across languages by making use of machine translation, parallel corpora, auxiliary unlabeled samples and other resources. In this paper, a new approach is propose...

متن کامل

Exploring Distributional Representations and Machine Translation for Aspect-based Cross-lingual Sentiment Classification

Cross-lingual sentiment classification (CLSC) seeks to use resources from a source language in order to detect sentiment and classify text in a target language. Almost all research into CLSC has been carried out at sentence and document level, although this level of granularity is often less useful. This paper explores methods for performing aspect-based cross-lingual sentiment classification (...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011